The JaRAP Experimental System Of Japanese-Russian Automatic Translation
Abstract
The paper is the first report on the experimental MT system developed as part of the Japanese-Russian Automatic Translation Project (JaRAP). The system follows the transfer approach to MT. Limited so far to lexico-morphological processing, it is seen as a foundation for more ambitious linguistic research. The system is implemented on IBM PC, MS DOS, in Arity Prolog (analysis and transfer) and Turbo Pascal (synthesis).

1 Theoretical background

The development of the JaRAP experimental system was preceded by a long period of purely theoretic research into various aspects of natural language and its functioning in translation (see, e.g., (Shalyapina, 1980a, 1980b, 1988)). Some of the basic principles which have evolved from this research may be summarized as follows.

(1) The most adequate scheme for simulating human translation activity is doubtless the transfer one.
(2) The level of transfer and the volume of structural and semantic information explicitly represented at this level should be determined experimentally as a compromise between the demands for translation adequacy under the given conditions and the advantages of "shortcuts" permitted by the superficial correspondences between the languages concerned.
(3) Semantics is not in itself a level of linguistic representation, but rather part of linguistic description at any level of representation of linguistic units.
(4) In its semantic aspects, syntax is dependent on lexicon to a greater extent than vice versa.
(5) A model aimed at faithful simulation of linguistic performance should make explicit use of the factor of linguistic normativity, this being, at least in prospect, a building block for "self-tuning" functions as an analogue for human learning capabilities.

An approach best suited for effectuating these principles seems to be that of relying on a lexicon-oriented lingware framework of a special kind.
Within this framework, entries of a uniform structure may be provided, besides lexical units, also for morphological categories, function elements (including punctuation), and all kinds of grammatical features, while syntagmatics of all levels may be presented in terms of valencies of those levels, assigned to the corresponding lexical or grammatical units in their entries. The JaRAP experimental system is meant to incorporate this approach.

In accordance with the transfer scheme of translation, the system is made up of three major components: the Japanese analysis component, the Japanese-Russian transfer component, and the Russian synthesis (generation) component. It is implemented on IBM PC, MS DOS, its programming tools being Arity Prolog for analysis and transfer, and Turbo Pascal for synthesis.

2 The current version of the JaRAP system

At present, the JaRAP system does not go far beyond the initial lexico-morphological level of text processing (though some provision has already been made for further stages of its development; see Sec. 3).

The analysis component of the system so far performs three main groups of operations: segmentation of the input Japanese texts into graphico-morphological (GM-) elements (stems and suffixes of Japanese words); processing of translationally idiomatic (TI-) combinations of GM-elements; and lexico-morphological (LM-) analysis of the resulting sequence of GM-elements and their TI-combinations.

Segmentation is accomplished in two steps. First, the input text (= the input sequence of kana and kanji codes) is broken up into fragments by contextual delimiters certain to denote word or morph boundaries (e.g., punctuation marks, the occurrence of a katakana symbol after a hiragana one or vice versa, etc.). Then the fragments obtained are segmented into GM-elements by means of dictionary
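The first segmentation step described above can be illustrated with a minimal Python sketch. It is not the system's own code, which was written in Arity Prolog. The sketch assumes Unicode input rather than the character codes of the original platform; the names `PUNCTUATION`, `script_of`, and `split_on_delimiters` are hypothetical, the punctuation set is only a small sample, and dropping the punctuation marks from the output is a simplification made here. The second, dictionary-based step is not shown.

```python
# Hypothetical sketch of step one of segmentation: splitting the input
# at contextual delimiters (punctuation, hiragana/katakana switches).
# Assumption: Unicode text; the original system worked on its own codes.

# A small sample of delimiter characters (assumption, not exhaustive).
PUNCTUATION = set("、。，．・「」『』（）！？,.!?;: \n")


def script_of(ch):
    """Classify a character by Unicode block."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def split_on_delimiters(text):
    """Break text into fragments at punctuation and kana-script switches."""
    fragments = []
    current = ""
    prev_script = None
    for ch in text:
        if ch in PUNCTUATION:
            # Punctuation closes the current fragment and is dropped here.
            if current:
                fragments.append(current)
            current = ""
            prev_script = None
            continue
        script = script_of(ch)
        # A hiragana/katakana switch (in either direction) is a boundary.
        if current and {prev_script, script} == {"hiragana", "katakana"}:
            fragments.append(current)
            current = ""
        current += ch
        prev_script = script
    if current:
        fragments.append(current)
    return fragments


print(split_on_delimiters("これはテストです。"))
```

Note that a switch between kanji and kana is deliberately not treated as a boundary in this sketch, since the text names only punctuation and hiragana/katakana transitions as examples of delimiters.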
Similar resources
Searching a Russian Document Collection using English, Chinese and Japanese Queries
As in CLEF 2003, Berkeley experimented with the CLEF Russian Izvestia document collection with monolingual and bilingual runs for the Russian collection. For CLEF 2004 we also experimented with Chinese and Japanese as topic languages, using English as the ‘pivot’ language. For bilingual retrieval our approaches were query translation (for English as a topic language) and ‘fast’ document transla...
Research on automatic translation at the Harvard Computation Laboratory
An automatic Russian-English dictionary of electronics and mathematics, comprising over 10,000 distinct Russian words represented by 22,000 stem entries recorded on magnetic tape, is now being used for the automatic processing of Russian scientific and technical texts. The mode of operation of the dictionary is described, and samples of the dictionary output products are shown. Immediate practi...
European Atomic Energy Community - Euratom 1968 Meeting of European Librarians Working in the Nuclear Field
The applications of the Russian-English MT system at CETIS as an instrument for information and documentation are presented. Four principal points are discussed: the Russian-English MT service at the request of the customers; current awareness with the automatic translation of the tables of contents of Russian periodicals; SDI with automatically translated abstracts from Russian periodicals; au...
ALT-J/C - a prototype Japanese-to-Chinese automatic language translation system
This paper describes a prototype Japanese-to-Chinese automatic language translation system. ALT-J/C (Automatic Language Translator Japanese-to-Chinese) is a semantic transfer based system, which is based on ALT-J/E (a Japanese-to-English system), but written to cope with Unicode. It is also designed to cope with constructions specific to Chinese. This system has the potential to become a framew...
Session 10: PROGRAMMING THE LOGIC OF AUTOMATIC FORMULA SYNTHESIS
Researchers in automatic translation have often been asked whether it might be possible to derive translation algorithms automatically--through a machine-programmed comparison of texts in both translated and untranslated versions. Suppose, for example, that parallel bodies of Russian and English scientific text are supplied as simultaneous inputs to a machine; can the machine be somehow instruc...